FigureQA: An Annotated Figure Dataset for Visual Reasoning

Authors

  • Samira Ebrahimi Kahou
  • Adam Atkinson
  • Vincent Michalski
  • Ákos Kádár
  • Adam Trischler
  • Yoshua Bengio
Abstract

We introduce FigureQA, a visual reasoning corpus of over one million question-answer pairs grounded in over 100,000 images. The images are synthetic, scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts. We formulate our reasoning task by generating questions from 15 templates; questions concern various relationships between plot elements and examine characteristics like the maximum, the minimum, area-under-the-curve, smoothness, and intersection. Resolving such questions often requires reference to multiple plot elements and synthesis of information distributed spatially throughout a figure. To facilitate the training of machine learning systems, the corpus also includes side data that can be used to formulate auxiliary objectives. In particular, we provide the numerical data used to generate each figure as well as bounding-box annotations for all plot elements. We study the proposed visual reasoning task by training several models, including the recently proposed Relation Network as a strong baseline. Preliminary results indicate that the task poses a significant machine learning challenge. We envision FigureQA as a first step towards developing models that can intuitively recognize patterns from visual representations of data.
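
To make the template idea concrete, here is a minimal sketch of how questions about the maximum and the area under the curve might be derived from a figure's underlying numerical data. Everything in it is an assumption for illustration: the function names, the template wording, the boolean answer format, and the use of the trapezoidal rule are hypothetical and are not taken from the FigureQA generation code.

```python
# Hypothetical sketch: derive template-style questions from the numerical
# data behind a line plot. Names and template wording are illustrative only.

def area_under_curve(xs, ys):
    """Trapezoidal-rule area under a piecewise-linear curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


def generate_questions(xs, series):
    """Return (question, answer) pairs for each named series.

    xs     -- shared x coordinates
    series -- dict mapping a series name to its list of y values
    """
    auc = {name: area_under_curve(xs, ys) for name, ys in series.items()}
    peak = {name: max(ys) for name, ys in series.items()}
    best_auc = max(auc.values())
    best_peak = max(peak.values())

    questions = []
    for name in series:
        # Assumed template 1: compares area under the curve across series.
        questions.append(
            (f"Does {name} have the maximum area under the curve?",
             auc[name] == best_auc)
        )
        # Assumed template 2: compares the highest point across series.
        questions.append(
            (f"Does {name} have the highest value?",
             peak[name] == best_peak)
        )
    return questions


xs = [0, 1, 2]
series = {"Red": [0, 2, 0], "Blue": [1.5, 1.5, 1.5]}
for question, answer in generate_questions(xs, series):
    print(question, answer)
```

In this toy figure, Red reaches the highest point while Blue covers the larger area, so answering either question requires comparing both series rather than inspecting a single plot element.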

Similar resources

Supplementary material: Spatio-temporal Person Retrieval via Natural Language Queries

In this section, we provide the further details of the dataset statistics. The description length. We first analyze the description length (i.e., the number of words in a description). Figure 1 shows the distribution of the number of words in a description. We can see that our dataset contains various lengths of descriptions. The average length of descriptions in our dataset is 13.1. We also sh...

Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems

We propose a new evaluation for automatic solvers for algebra word problems, which can identify reasoning mistakes that existing evaluations overlook. Our proposal is to use derivations for evaluations, which reflect the reasoning process of the solver by explaining how the equation system was constructed. We accomplish this by developing an algorithm for checking the equivalence between two de...

Comparison of Moral Reasoning among Students with and without Visual Impairment

Background and Purpose: Some research has examined the moral reasoning and judgment in students with special needs and has shown that these students are lagging behind their non-disabled counterparts in terms of moral development. Very few studies have been done in the area of development of moral reasoning in individuals with visual impairment; so given the research vacuum in this context, the ...

A dataset and architecture for visual reasoning with a working memory

A vexing problem in artificial intelligence is reasoning about events that occur in complex, changing visual stimuli such as in video analysis or game play. Inspired by a rich tradition of visual reasoning and memory in cognitive psychology and neuroscience, we developed an artificial, configurable visual question and answer dataset (COG) to parallel experiments in humans and animals. COG is mu...

Moments in Time Dataset: one million videos for event understanding

We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3 second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and aud...

Journal:
  • CoRR

Volume: abs/1710.07300

Publication date: 2017